Acute lymphoblastic leukemia (ALL) has recently become one of the most 
significant concerns among cancers, especially in children and older adults. 
There is therefore a pressing need to diagnose leukemia as early as possible, 
which increases treatment options and patient survival. Some basic manual 
leukemia detection processes have been introduced in this area, though these 
are not very accurate or efficient. The proposed approach introduces an 
automated ALL recognition system based on peripheral blood smear images. 
Initially, a color threshold is applied to segment lymphocytes from the blood 
smear. Post-processing techniques such as morphological operations and the 
watershed transform are then executed to segment the individual lymphocytes. 
Finally, we use a support vector machine (SVM) classifier to classify the 
cancerous image frames using a statistical feature vector obtained from the 
segmented image. The proposed framework achieved the highest accuracy of 
99.21%, sensitivity of 98.45%, specificity of 99%, precision of 99%, and 
F1 score of 99.1%, outperforming existing state-of-the-art methods. We are 
confident that the proposed approach will positively impact the field of 
ALL detection. 
1. INTRODUCTION 


The diagnosis of blood cancer is usually considered one of the biggest challenges in the healthcare 
industry, as it has long relied on the hematologist's ability to detect it. However, identifying cancer as early 
as possible is critical for faster responses and improved care options. Computer-aided diagnosis has been 
introduced to reduce the physician's burden and data overload [1]. The primary purpose of computer-aided 
diagnosis is to detect abnormalities as early as possible, which is sometimes impossible for a physician 
working manually [2]. Escalante et al. [3] developed an ensemble particle swarm-based model to recognize 
acute leukemia in digitized bone marrow images. This model achieved better leukemia detection performance 
than manual procedures, reaching 97.68% for binary classification and 94.21% for multi-class 
classification [3]. Rawat et al. [4] introduced an intelligent blood cancer diagnosis system based on gray level 
co-occurrence matrix (GLCM) texture features and shape features, in which a support vector machine (SVM) 
was used to classify normal and abnormal images from the cancer dataset. The model obtained 89.8% 
classification accuracy using the extracted texture and shape features [4]. Putzu et al. [5] compared acute 
lymphoblastic leukemia (ALL) detection results obtained with different classifiers; the highest accuracy in 
their comparison, 93% with 98% sensitivity, was achieved by an SVM classifier with a Gaussian radial basis 
function kernel [5]. Mohapatra et al. [6] proposed a cold agglutinin disease (CAD) leukemia diagnostic model 
based on a fuzzy segmentation technique and achieved an accuracy of 93% with an SVM classifier. Normal 
and blast cells have also been classified with an SVM classifier using shape and color features, achieving 
93.7% accuracy, 92% sensitivity, and 91% specificity on the acute lymphoblastic leukemia image database 
(ALL-IDB-1) dataset [7]. In [8], an automated ALL detection model was tested on blood smear images to 
identify abnormalities. 

A decision tree classifier was applied in [8], achieving an accuracy of 96.25%, sensitivity of 97.3%, 
and specificity of 95.35%. Laboratory techniques including fluorescence in situ hybridization (FISH), 
cytochemistry, cytogenetic analysis, and immunophenotyping are also used to classify leukemia, but they are 
very time-consuming [9]. The system proposed in [10] can differentiate leukemia cells from healthy 
lymphocytes in the ALL-IDB-1 dataset. Nevertheless, accurate feature extraction from leukemia and normal 
images remains challenging [11], [12]. Shafique et al. [13] used color features with an SVM classifier and 
achieved 93.7% accuracy, but with low sensitivity and specificity. Tuba et al. [14] introduced shape and 
texture features and classified leukemia with an SVM classifier. An iterative distance transform was used to 
segment circular leukemia cells in [15]. In contrast, Al Mamun et al. [16] used fuzzy logic to segment bleeding 
regions. Many color threshold segmentation approaches have been applied to distinguish specific 
abnormalities from normal conditions [17]-[20], though these achieve lower accuracy. 

This paper introduces an efficient automated ALL detection framework evaluated on the ALL-IDB 
dataset. A color threshold approach is applied to extract the relevant information, and a feature vector is 
created by calculating statistical features. Leukemia detection is then performed with high accuracy using an 
SVM classifier. 


2. RESEARCH METHOD 

The proposed model is an automated ALL recognition system operating on peripheral blood smear 
images. The complete workflow of the proposed method is depicted in Figure 1. First, a color threshold is 
applied to segment lymphocytes from the blood smear. Post-processing techniques such as morphological 
operations and the watershed transform are then executed to segment the individual lymphocytes. Finally, an 
SVM is applied to the feature vector to classify the cancerous images. A Linux-based environment, version 
18.04, with a GTX 1080Ti GPU was used to train and test the proposed model. 


[Figure 1 workflow: Dataset -> Color threshold -> Post-processing -> Feature extraction -> SVM classifier] 


Figure 1. Schematic diagram of the proposed leukemia determination method 


2.1. Dataset 

Images from ALL-IDB, a publicly available dataset, have been used in this research work [21]. 
The dataset contains 108 image frames in two categories, non-cancerous and leukemia: 49 leukemia frames 
and 59 non-cancerous frames. The resolution is 2592x1944 for the healthy image frames and 1712x1368 for 
the leukemia frames. Sample image frames of healthy and cancerous cells are shown in Figures 2(a) and 
2(b). 


Figure 2. ALL-IDB sample image frames (a) normal cell and (b) abnormal cell 


2.2. Color threshold 

The images in the dataset are in the red, green, blue (RGB) color space, and the color intensity 
differs between the two categories, which makes it challenging to extract the ALL cells directly from RGB 
images. The image frames have therefore been converted into the YCbCr color space to better separate the 
intensities of the cancerous and non-cancerous images. A color threshold has been applied to create a mask 
that extracts the informative portion (leukemia) from the image frames, and this color mask segments the 
leukemia cells. The non-informative part of the image, excluding leukemia, is discarded at the same stage, 
as shown in Figures 3(a) and 3(b). 
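The color-threshold step can be sketched as follows, as an illustration only. The paper does not report its threshold values or which YCbCr variant it used, so this sketch assumes the full-range ITU-R BT.601 (JPEG) conversion, and the threshold ranges in the usage example are hypothetical values chosen for a synthetic image. The function names `rgb_to_ycbcr`, `ycbcr_threshold_mask`, and `apply_mask` are illustrative, not taken from the original work.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an H x W x 3 RGB image (0-255) to full-range BT.601 YCbCr (assumed variant)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_threshold_mask(rgb, y_range, cb_range, cr_range):
    """Boolean mask of pixels whose Y, Cb and Cr values all lie in the
    given inclusive (low, high) ranges."""
    ycc = rgb_to_ycbcr(rgb)
    mask = np.ones(ycc.shape[:2], dtype=bool)
    for channel, (low, high) in enumerate((y_range, cb_range, cr_range)):
        mask &= (ycc[..., channel] >= low) & (ycc[..., channel] <= high)
    return mask

def apply_mask(rgb, mask):
    """Copy of the image with every pixel outside the mask set to zero (discarded)."""
    out = np.array(rgb, copy=True)
    out[~mask] = 0
    return out

# Tiny synthetic image: one purple (stained-nucleus-like) pixel,
# two white pixels and one pale pink pixel.
image = np.array([[[100, 50, 150], [255, 255, 255]],
                  [[255, 255, 255], [230, 200, 210]]], dtype=np.uint8)
# Hypothetical ranges: dark pixels (low Y) with high Cb and Cr.
nucleus_mask = ycbcr_threshold_mask(image, (0, 150), (130, 255), (130, 255))
print(nucleus_mask.tolist())  # [[True, False], [False, False]]
segmented = apply_mask(image, nucleus_mask)
```

Other YCbCr variants, such as the studio-range form with Y in 16-235, shift the channel values. Any real thresholds therefore have to be tuned for the same variant that produced them.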




Figure 3. Segmented image by applying color threshold (a) normal cell and (b) abnormal cell 




2.3. Post-processing 

The output images from the segmentation stage contain considerable noise and unwanted information, 
which may mislead the recognition result. To suppress this noise and prevent output elements from 
overflowing, a multidimensional image filter has been implemented; its filtered values are also compared with 
the target. The watershed transform has then been applied; it returns a label matrix that identifies the 
watershed regions in the image. It detects the watershed ridge lines by treating the image as a surface in which 
light pixels represent high elevations and dark pixels represent low elevations. 
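The watershed step described above can be illustrated with a minimal marker-controlled watershed that uses priority flooding. This is a simplified sketch and not the exact routine used in the paper. The paper does not state how the seeds were chosen; a common choice is to use the regional maxima of the distance transform of the binary mask, with the negated distance serving as the elevation surface. The function name `watershed_from_markers` is hypothetical. Unlike MATLAB's `watershed`, which labels ridge pixels 0, this sketch assigns each ridge pixel to whichever basin reaches it first.

```python
import heapq
import numpy as np

def watershed_from_markers(elevation, markers, mask=None):
    """Marker-controlled watershed by priority flooding (4-connectivity).

    elevation: 2-D array treated as a height surface (high values = ridges).
    markers:   2-D int array, 0 = unlabeled, k > 0 = seed of basin k.
    mask:      optional boolean array; pixels outside it are never flooded.
    Returns a label matrix; unreached pixels keep label 0.
    """
    elevation = np.asarray(elevation, dtype=float)
    labels = np.array(markers, dtype=int, copy=True)
    allowed = np.ones(labels.shape, dtype=bool) if mask is None else np.asarray(mask, dtype=bool)
    rows, cols = labels.shape
    heap, counter = [], 0  # counter breaks elevation ties in insertion order
    for r, c in zip(*np.nonzero(labels > 0)):
        heapq.heappush(heap, (elevation[r, c], counter, r, c))
        counter += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)  # lowest queued pixel is processed first
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr, nc] == 0 and allowed[nr, nc]:
                labels[nr, nc] = labels[r, c]  # the first basin to reach a pixel claims it
                heapq.heappush(heap, (elevation[nr, nc], counter, nr, nc))
                counter += 1
    return labels

# Two basins separated by a high ridge in column 2.
elev = np.array([[0, 1, 9, 1, 0]] * 3)
seeds = np.zeros((3, 5), dtype=int)
seeds[1, 0], seeds[1, 4] = 1, 2
regions = watershed_from_markers(elev, seeds)
```

In practice, an optimized library routine such as MATLAB's `watershed` or scikit-image's `skimage.segmentation.watershed` would normally be used instead of this loop.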

Eventually, the relevant section of each image has been extracted and separated completely from the 
image background and any other irrelevant objects. This is an efficient technique for isolating the object of 
interest from the image background as well as from other unwanted objects [22]. After separating the 
information from the image background and other objects, we applied morphological processing. This 
technique applies a structuring element to the input image and produces an output image of the same size. 
In a morphological operation, the value of each output pixel is determined by comparing the corresponding 
input pixel with its neighbors [23]. By appropriately selecting the shape and size of the neighborhood, we 
constructed a morphological operation that is sensitive to specific shapes in the given data. In this research 
work, dilation and erosion have been applied with a structuring element. Dilation adds pixels to the 
boundaries of objects and erosion removes pixels from them; the number of pixels added or removed depends 
on the shape and size of the structuring element. After the morphological operations were completed, the 
most informative section of each image was obtained. The final segmentation of the images is shown in 
Figure 4. 
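Binary erosion and dilation with a structuring element can be sketched in NumPy as follows. The sketch assumes an odd-sized, symmetric structuring element, and the function names are illustrative. Its border handling is a choice made only for this sketch: pixels outside the image count as background during dilation and as foreground during erosion. An opening (erosion followed by dilation) shows how isolated noise pixels smaller than the structuring element are removed while larger objects keep their shape.

```python
import numpy as np

def disk(radius):
    """Boolean disk-shaped structuring element of the given radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def _shifted_views(mask, selem, pad_value):
    """Yield the mask shifted by every offset where the structuring element is True
    (an odd-sized, symmetric element is assumed)."""
    kr, kc = selem.shape[0] // 2, selem.shape[1] // 2
    padded = np.pad(mask, ((kr, kr), (kc, kc)), mode="constant", constant_values=pad_value)
    h, w = mask.shape
    for dr, dc in zip(*np.nonzero(selem)):
        yield padded[dr:dr + h, dc:dc + w]

def dilate(mask, selem):
    """Binary dilation: adds pixels to object boundaries."""
    mask, selem = np.asarray(mask, dtype=bool), np.asarray(selem, dtype=bool)
    out = np.zeros_like(mask)
    for view in _shifted_views(mask, selem, False):
        out |= view
    return out

def erode(mask, selem):
    """Binary erosion: removes pixels from object boundaries."""
    mask, selem = np.asarray(mask, dtype=bool), np.asarray(selem, dtype=bool)
    out = np.ones_like(mask)
    for view in _shifted_views(mask, selem, True):
        out &= view
    return out

def opening(mask, selem):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(mask, selem), selem)

cell = np.zeros((7, 7), dtype=bool)
cell[2:5, 2:5] = True            # a 3 x 3 "cell"
noisy = cell.copy()
noisy[0, 6] = True               # an isolated noise pixel
cleaned = opening(noisy, np.ones((3, 3), dtype=bool))
print((cleaned == cell).all())   # True
```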
Figure 4. Final segmented image of leukemia 


2.4. Feature extraction 

Feature extraction is a significant step before machine learning-based classification is applied to a 
dataset. It is a form of dimensionality reduction that efficiently represents a particular region of the image as 
a feature vector. The proposed technique computes the following statistical (intensity) and shape features from 
the segmented region of the image: mean, variance, mode, form factor, eccentricity, entropy, solidity, 
elongation, compactness, and rectangularity. Because the YCbCr color space provided the best classification 
accuracy, features are extracted in the YCbCr color space for the proposed model. 
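Several of the listed features can be computed from a segmented region as shown in the sketch below. The paper gives no formulas for its features, so the definitions here are common textbook choices and should be treated as assumptions: form factor = 4πA/P², compactness = P²/A, rectangularity = area divided by bounding-box area, and eccentricity taken from the second central moments of the region. The perimeter is estimated crudely by counting boundary pixels, which is why the form factor of small shapes can exceed 1. Solidity and elongation are omitted. The intensity features assume an 8-bit single channel (for example, the Y channel), and all function names are hypothetical.

```python
import numpy as np

def intensity_features(gray, mask):
    """Mean, variance, mode and Shannon entropy (bits) of 8-bit values inside the mask."""
    values = np.asarray(gray)[np.asarray(mask, dtype=bool)].astype(np.int64)
    hist = np.bincount(values, minlength=256)
    p = hist / hist.sum()
    nz = p[p > 0]
    return {
        "mean": float(values.mean()),
        "variance": float(values.var()),
        "mode": int(np.argmax(hist)),  # smallest gray level among ties
        "entropy": float(-(nz * np.log2(nz)).sum()),
    }

def shape_features(mask):
    """A few shape descriptors of a non-empty binary region."""
    mask = np.asarray(mask, dtype=bool)
    rows, cols = np.nonzero(mask)
    area = rows.size
    bbox_area = (rows.max() - rows.min() + 1) * (cols.max() - cols.min() + 1)
    # Eccentricity of the ellipse with the same second central moments.
    eig = np.linalg.eigvalsh(np.cov(np.vstack([rows, cols]), bias=True))  # ascending
    ecc = float(np.sqrt(max(0.0, 1.0 - eig[0] / eig[1]))) if eig[1] > 0 else 0.0
    # Perimeter estimate: foreground pixels with at least one background 4-neighbor.
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())
    return {
        "area": int(area),
        "rectangularity": float(area / bbox_area),
        "eccentricity": ecc,
        "form_factor": float(4.0 * np.pi * area / perimeter ** 2),
        "compactness": float(perimeter ** 2 / area),
    }

gray = np.array([[10, 10], [20, 255]], dtype=np.uint8)
roi = np.array([[True, True], [True, False]])
stats = intensity_features(gray, roi)

square = np.zeros((5, 5), dtype=bool)
square[1:4, 1:4] = True
line = np.zeros((3, 7), dtype=bool)
line[1, 1:6] = True
square_shape, line_shape = shape_features(square), shape_features(line)
```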


2.5. Classification 

The SVM is a supervised learning model that can be used for both classification and regression, and it 
is particularly well suited to two-class problems. Distinguishing leukemia from non-cancerous images is 
likewise a two-class problem. The training vectors that are most significant for constructing the decision 
boundary are called support vectors. Let the training data be $p_i$, $i = 1, \dots, N$, each associated with a 
class label $y_i$ and consisting of the color features of the $M$ images; each $p_i$ is a feature vector of 
dimension $S$. A function $f(p) = f(k, p)$ with weight vector $k = [k_1\ k_2\ \cdots\ k_S]^T$ is employed so 
that it best matches the class labels $y_i$. With the two classes labeled +1 and -1, each training vector $p_i$ 
satisfies 

$$
\begin{cases}
k^{T} p_i + b \ge +1, & \text{positive } p_i \\
k^{T} p_i + b \le -1, & \text{negative } p_i
\end{cases}
$$

Assuming a kernel function $K(x, y)$ and a coefficient vector $c$, the discriminant function can be 
defined as 

$$
f(x) = \sum_{i=1}^{N} c_i \, K(p_i, x) + b
$$

The SVM classifier can be made more effective in some cases by adopting a non-linear kernel function [24]. 
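The discriminant function above can be evaluated directly once the support vectors $p_i$, the coefficients $c_i$ (in the usual dual formulation $c_i = \alpha_i y_i$), and the bias $b$ are known. The minimal sketch below uses a Gaussian (RBF) kernel. The support vectors, coefficients, bias, and `gamma` are made-up values, not a model trained in this study; in practice they come from training, for example with MATLAB's `fitcsvm` or scikit-learn's `SVC`. The function names are illustrative.

```python
import math

def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def linear_kernel(x, y):
    """Linear kernel K(x, y) = x . y; then f(x) = k^T x + b with k = sum_i c_i p_i."""
    return sum(a * b for a, b in zip(x, y))

def decision_function(x, support_vectors, coefficients, bias, kernel=gaussian_kernel):
    """f(x) = sum_i c_i K(p_i, x) + b."""
    return sum(c * kernel(p, x) for p, c in zip(support_vectors, coefficients)) + bias

def predict(x, support_vectors, coefficients, bias, kernel=gaussian_kernel):
    """Label +1 when f(x) >= 0, otherwise -1."""
    return 1 if decision_function(x, support_vectors, coefficients, bias, kernel) >= 0 else -1

# Made-up model: one support vector per class in a 2-D feature space.
svs = [(0.0, 0.0), (2.0, 2.0)]
coefs = [1.0, -1.0]   # hypothetical c_i = alpha_i * y_i
b = 0.0
print(predict((0.2, 0.1), svs, coefs, b))   # 1
print(predict((1.9, 2.1), svs, coefs, b))   # -1
```

With `linear_kernel`, the same function reduces to the linear decision rule $k^T x + b$ used in the margin constraints. The Gaussian kernel shows the non-linear case.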


3. RESULTS AND DISCUSSION 

As with other binary classification models, leukemia diagnosis has four possible outcomes: i) true 
positive (TP), ii) false positive (FP), iii) true negative (TN), and iv) false negative (FN). For this type of 
classification model, accuracy alone cannot establish the method's reliability, so other relevant metrics, 
including sensitivity, precision, specificity, and F1 score, have been considered to evaluate the performance of 
the proposed method. The proposed method achieved the best accuracy of 99.21%, sensitivity of 98.45%, 
specificity of 99%, precision of 99%, and F1 score of 99.1%, as depicted in Figure 5. 
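The metrics above follow directly from the four outcome counts, as the short sketch below shows. The counts in the usage line are hypothetical and serve only as an illustration; they are not the confusion matrix of this study. The negative predictive value, which is mentioned in the comparison discussion, is included for completeness.

```python
def classification_metrics(tp, fp, tn, fn):
    """Binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    precision = tp / (tp + fp)     # positive predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": precision,
        "npv": tn / (tn + fn),           # negative predictive value
        # Harmonic mean of precision and sensitivity.
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
print({name: round(value, 4) for name, value in metrics.items()})
```

Because F1 is fully determined by precision and sensitivity, reported values of these three metrics can be cross-checked with this function.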

We also tried four different color spaces while developing this model to examine whether the choice of 
color space affects performance. Figure 6 compares the performance in the RGB, hue-saturation-value (HSV), 
YCbCr, and L*a*b* color spaces. As Figure 6 shows, the YCbCr color space gives the best result among 
the color spaces tested. 

Furthermore, the classifier was varied to select the best of four candidate algorithms. Figure 7 depicts 
the performance of the four classifiers used: quadratic discriminant analysis (QDA), fine Gaussian support 
vector machine, linear discriminant analysis (LDA), and logistic regression. The performance metrics show 
that the SVM is the best of these classifiers for leukemia detection. 
Figure 5. The performance of the proposed leukemia detection 
Figure 6. The comparison results in different color spaces 
Figure 7. The comparison results in different classifiers 
Although much research is ongoing in leukemia detection, our proposed method is distinct and is 
expected to contribute to this research domain. A comparison between existing methods and the proposed 
method is shown in Table 1. The table shows that the proposed method outperforms the existing methods in 
terms of accuracy, sensitivity, and specificity. In addition, this research work has analyzed other relevant 
performance metrics, including precision, negative predictive value, and F1 score. In summary, the proposed 
technique is a more accurate and reliable model for leukemia detection than the compared methods. 


Table 1. A comparison study on different existing methods 


Method                          Classifier       Accuracy (%)   Sensitivity (%)   Specificity (%)
Putzu et al. [5]                SVM              93             98                -
Li et al. [7]                   SVM              93.3           92                91
El Houby [8]                    Decision tree    96.25          97.3              95.35
Umamaheswari and Geetha [25]    KNN              96.25          95                97
Proposed method                 SVM              99.21          98.45             99


4. CONCLUSION 

ALL is a potentially fatal disease. An accurate and reliable diagnostic method is a prerequisite for 
successful ALL treatment. The main objective of the proposed method is to diagnose ALL more accurately 
and efficiently. To this end, the proposed approach introduces an efficient automated ALL detection 
framework evaluated on the ALL-IDB dataset. A color threshold approach is applied to extract the relevant 
information, and a feature vector is created by calculating statistical features. An SVM-based machine 
learning classifier is then employed to distinguish leukemia from normal cases. The proposed framework 
achieved the highest accuracy of 99.21%, sensitivity of 98.45%, specificity of 99%, precision of 99%, and 
F1 score of 99.1%, outperforming several existing methods. In future work, the results could be made more 
robust by examining more advanced classifiers and features. 